OpenVINO 2022.3 in Practice, Part 6: INT8 Quantization of a YOLOv5 Model with NNCF


1 Convert the YOLOv5 model to OpenVINO IR

Use the OpenVINO Model Optimizer to convert the YOLOv5 model to the OpenVINO IR format so it can run inference on Intel hardware.

Download the YOLOv5 code from ultralytics/yolov5, then export the model:

python export.py --weights yolov5s.pt --include onnx

This exports the model to ONNX. Next, use mo to produce OpenVINO FP32 and FP16 IR models:

import nncf
from openvino.tools import mo
from openvino.runtime import serialize

MODEL_NAME = "yolov5s"
MODEL_PATH = "weights/yolov5"
onnx_path = f"{MODEL_PATH}/{MODEL_NAME}.onnx"

# FP32 IR model
fp32_path = f"{MODEL_PATH}/FP32_openvino_model/{MODEL_NAME}_fp32.xml"
print(f"Export ONNX to OpenVINO FP32 IR to: {fp32_path}")
model = mo.convert_model(onnx_path)
serialize(model, fp32_path)

# FP16 IR model
fp16_path = f"{MODEL_PATH}/FP16_openvino_model/{MODEL_NAME}_fp16.xml"
print(f"Export ONNX to OpenVINO FP16 IR to: {fp16_path}")
model = mo.convert_model(onnx_path, compress_to_fp16=True)
serialize(model, fp16_path)
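To sanity-check the exported IR before quantizing it, the model can be compiled and run on random input. A minimal sketch using the fp32_path defined above; the 1x3x640x640 input shape matches the default yolov5s export:

import numpy as np
from openvino.runtime import Core

core = Core()
compiled = core.compile_model(fp32_path, "CPU")

# The default yolov5s export expects a 1x3x640x640 float input in NCHW layout.
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
result = compiled([dummy])

# For yolov5s on COCO, the raw prediction output has shape (1, 25200, 85).
print(result[compiled.output(0)].shape)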

2 Prepare the dataset for quantization

Prepare the dataset in a format that can be used for quantization.

from utils.dataloaders import create_dataloader  # from the ultralytics/yolov5 repo
from utils.general import check_dataset          # from the ultralytics/yolov5 repo

DATASET_CONFIG = "./data/coco128.yaml"


def create_data_source():
    """
    Creates the COCO 128 validation data loader.
    Downloads the dataset if it does not exist.
    """
    data = check_dataset(DATASET_CONFIG)
    val_dataloader = create_dataloader(
        data["val"], imgsz=640, batch_size=1, stride=32, pad=0.5, workers=1
    )[0]
    return val_dataloader


data_source = create_data_source()


# Define the transformation method. This method should take a data item returned
# per iteration through the `data_source` object and transform it into the model's
# expected input that can be used for the model inference.
def transform_fn(data_item):
    # unpack input images tensor
    images = data_item[0]
    # convert input tensor into float format
    images = images.float()
    # scale input
    images = images / 255
    # convert torch tensor to numpy array
    images = images.cpu().detach().numpy()
    return images


# Wrap the framework-specific data source into an `nncf.Dataset` object.
nncf_calibration_dataset = nncf.Dataset(data_source, transform_fn)
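Before calibrating, it is worth confirming that transform_fn produces what the model expects. A quick check, assuming the loader yields (images, targets, paths, shapes) tuples as in the upstream YOLOv5 repo:

sample = next(iter(data_source))
calib_input = transform_fn(sample)
# Expect a float32 NCHW array scaled to [0, 1], e.g. shape (1, 3, 640, 640).
print(calib_input.shape, calib_input.dtype, calib_input.min(), calib_input.max())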

3 Configure the quantization pipeline

Configure the quantization pipeline, for example by choosing an appropriate quantization algorithm and setting the target accuracy.

In NNCF, the post-training quantization pipeline is represented by the nncf.quantize function (the DefaultQuantization algorithm) and the nncf.quantize_with_accuracy_control function (the AccuracyAwareQuantization algorithm). The quantization parameters preset, model_type, subset_size, fast_bias_correction, and ignored_scope are passed as function arguments.

subset_size = 300
preset = nncf.QuantizationPreset.MIXED
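The other parameters mentioned above are passed the same way. A hedged sketch of ignored_scope, assuming the NNCF version bundled with OpenVINO 2022.3 exposes nncf.IgnoredScope; the node name and type below are placeholders, not real yolov5s layer names:

# Hypothetical illustration only: replace the placeholders with real node
# names/types from your own model before using this.
ignored_scope = nncf.IgnoredScope(
    names=["/model.24/Concat"],  # placeholder node name
    types=["Multiply"],          # skip all nodes of a given operation type
)

# quantized_model = nncf.quantize(
#     ov_model,
#     nncf_calibration_dataset,
#     preset=preset,
#     subset_size=subset_size,
#     fast_bias_correction=True,
#     ignored_scope=ignored_scope,
# )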

4 Run model optimization

Optimize the model for inference on Intel hardware, for example by applying post-training quantization or pruning techniques.

from openvino.runtime import Core
from openvino.runtime import serialize

core = Core()
ov_model = core.read_model(fp32_path)
quantized_model = nncf.quantize(
    ov_model, nncf_calibration_dataset, preset=preset, subset_size=subset_size
)
nncf_int8_path = f"{MODEL_PATH}/NNCF_INT8_openvino_model/{MODEL_NAME}_int8.xml"
serialize(quantized_model, nncf_int8_path)
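Quantization also shrinks the serialized weights. A minimal sketch comparing the .bin file sizes of the three IRs, using the paths defined above:

from pathlib import Path

# The weights of each IR live in the .bin file next to its .xml.
for name, xml_path in [("FP32", fp32_path), ("FP16", fp16_path), ("INT8", nncf_int8_path)]:
    bin_path = Path(xml_path).with_suffix(".bin")
    print(f"{name}: {bin_path.stat().st_size / 2**20:.2f} MiB")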

5 Compare the accuracy of the FP32, FP16, and INT8 models

Compare the accuracy of the FP32, FP16, and INT8 models on the validation dataset to determine whether quantization caused any accuracy loss.

from pathlib import Path
from val import run as validation_fn  # val.py from the ultralytics/yolov5 repo

print("Checking the accuracy of the original model:")
fp32_metrics = validation_fn(
    data=DATASET_CONFIG,
    weights=Path(fp32_path).parent,
    batch_size=1,
    workers=1,
    plots=False,
    device="cpu",
    iou_thres=0.65,
)
fp32_ap5 = fp32_metrics[0][2]
fp32_ap_full = fp32_metrics[0][3]
print(f"mAP@.5 = {fp32_ap5}")
print(f"mAP@.5:.95 = {fp32_ap_full}")

print("Checking the accuracy of the FP16 model:")
fp16_metrics = validation_fn(
    data=DATASET_CONFIG,
    weights=Path(fp16_path).parent,
    batch_size=1,
    workers=1,
    plots=False,
    device="cpu",
    iou_thres=0.65,
)
fp16_ap5 = fp16_metrics[0][2]
fp16_ap_full = fp16_metrics[0][3]
print(f"mAP@.5 = {fp16_ap5}")
print(f"mAP@.5:.95 = {fp16_ap_full}")

print("Checking the accuracy of the NNCF int8 model:")
int8_metrics = validation_fn(
    data=DATASET_CONFIG,
    weights=Path(nncf_int8_path).parent,
    batch_size=1,
    workers=1,
    plots=False,
    device="cpu",
    iou_thres=0.65,
)
nncf_int8_ap5 = int8_metrics[0][2]
nncf_int8_ap_full = int8_metrics[0][3]
print(f"mAP@.5 = {nncf_int8_ap5}")
print(f"mAP@.5:.95 = {nncf_int8_ap_full}")

Output:

Checking the accuracy of the original model:
mAP@.5 = 0.7064319945599192
mAP@.5:.95 = 0.4716138340017886
Checking the accuracy of the FP16 model:
mAP@.5 = 0.7064771913549115
mAP@.5:.95 = 0.47165677301239517
Checking the accuracy of the NNCF int8 model:
mAP@.5 = 0.6900523281577972
mAP@.5:.95 = 0.45860702355897537
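From these numbers, the INT8 accuracy drop can be computed directly from the variables above (simple arithmetic; with the values shown, mAP@.5 drops by roughly 0.016, about 2.3% relative):

# Accuracy drop of INT8 relative to the FP32 baseline.
drop_ap5 = fp32_ap5 - nncf_int8_ap5
drop_ap_full = fp32_ap_full - nncf_int8_ap_full
print(f"mAP@.5 drop: {drop_ap5:.4f} ({100 * drop_ap5 / fp32_ap5:.2f}% relative)")
print(f"mAP@.5:.95 drop: {drop_ap_full:.4f} ({100 * drop_ap_full / fp32_ap_full:.2f}% relative)")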

6 Compare the performance of the FP32, FP16, and INT8 models

Compare the performance of the FP32, FP16, and INT8 models, for example by measuring inference time and memory usage.

benchmark_app -m weights/yolov5/FP32_openvino_model/yolov5s_fp32.xml -d CPU -api async -t 15
benchmark_app -m weights/yolov5/FP16_openvino_model/yolov5s_fp16.xml -d CPU -api async -t 15
benchmark_app -m weights/yolov5/NNCF_INT8_openvino_model/yolov5s_int8.xml -d CPU -api async -t 15
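benchmark_app in async mode keeps several inference requests in flight at once, which helps explain why the INT8 run below shows higher per-request latency yet much higher throughput. For a rough synchronous measurement in Python, a minimal sketch timing one request at a time on the INT8 model:

import time
import numpy as np
from openvino.runtime import Core

core = Core()
compiled = core.compile_model(nncf_int8_path, "CPU")
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)

# Warm up, then time synchronous single-request inference.
for _ in range(10):
    compiled([dummy])
n = 100
start = time.perf_counter()
for _ in range(n):
    compiled([dummy])
print(f"Mean latency: {1000 * (time.perf_counter() - start) / n:.2f} ms")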

Output:

Inference FP32 model (OpenVINO IR) on CPU:
[Step 11/11] Dumping statistics report
[ INFO ] Count: 2504 iterations
[ INFO ] Duration: 15067.63 ms
[ INFO ] Latency:
[ INFO ] Median: 47.65 ms
[ INFO ] Average: 47.99 ms
[ INFO ] Min: 40.73 ms
[ INFO ] Max: 74.31 ms
[ INFO ] Throughput: 166.18 FPS

Inference FP16 model (OpenVINO IR) on CPU:
[Step 11/11] Dumping statistics report
[ INFO ] Count: 2536 iterations
[ INFO ] Duration: 15069.53 ms
[ INFO ] Latency:
[ INFO ] Median: 47.11 ms
[ INFO ] Average: 47.38 ms
[ INFO ] Min: 38.03 ms
[ INFO ] Max: 65.95 ms
[ INFO ] Throughput: 168.29 FPS

Inference NNCF INT8 model (OpenVINO IR) on CPU:
[Step 11/11] Dumping statistics report
[ INFO ] Count: 7872 iterations
[ INFO ] Duration: 15113.06 ms
[ INFO ] Latency:
[ INFO ] Median: 61.17 ms
[ INFO ] Average: 61.23 ms
[ INFO ] Min: 52.75 ms
[ INFO ] Max: 93.93 ms
[ INFO ] Throughput: 520.87 FPS

